Random mappings designed for commercial search engines

Authors

  • Roger Donaldson
  • Arijit Gupta
  • Yaniv Plan
  • Thomas Reimer
Abstract

We give a practical random mapping that takes any set of documents represented as vectors in Euclidean space and then maps them to a sparse subset of the Hamming cube while retaining ordering of inter-vector inner products. Once represented in the sparse space, it is natural to index documents using commercial text-based search engines which are specialized to take advantage of this sparse and discrete structure for large-scale document retrieval. We give a theoretical analysis of the mapping scheme, characterizing exact asymptotic behavior and also giving non-asymptotic bounds which we verify through numerical simulations. We balance the theoretical treatment with several practical considerations; these allow substantial speed up of the method. We further illustrate the use of this method on search over two real data sets: a corpus of images represented by their color histograms, and a corpus of daily stock market index values.
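
To make the idea in the abstract concrete, here is a minimal sketch of one way such a mapping could look. It is not the authors' construction: the abstract does not specify the scheme, so this example assumes a Gaussian random projection followed by keeping the largest coordinates as the "on" bits. The function names, the token format, and the parameters `n_bits` and `n_active` are hypothetical and chosen for illustration only.

```python
import numpy as np


def sparse_binary_codes(X, n_bits=4096, n_active=64, seed=0):
    """Map each row of X to a 0/1 vector with exactly n_active ones.

    Assumed scheme (not taken from the paper): project with a random
    Gaussian matrix, then set the n_active largest projections to 1.
    """
    rng = np.random.default_rng(seed)
    G = rng.standard_normal((X.shape[1], n_bits))
    P = X @ G
    # Column indices of the n_active largest projections in each row.
    top = np.argpartition(P, -n_active, axis=1)[:, -n_active:]
    codes = np.zeros((X.shape[0], n_bits), dtype=np.uint8)
    np.put_along_axis(codes, top, 1, axis=1)
    return codes


def to_tokens(code):
    """Render one sparse code as whitespace-separated text tokens.

    The "b<index>" token format is a made-up convention; any text-based
    index that matches on shared tokens could consume such a string.
    """
    return " ".join(f"b{i}" for i in np.flatnonzero(code))


def overlap(c1, c2):
    """Number of shared active bits, i.e. the inner product of two codes."""
    return int(c1.astype(int) @ c2.astype(int))
```

Under these assumptions, two documents with a larger inner product tend to share more active bits, so a token-matching search engine that scores by shared tokens would tend to rank them in a similar order. How well the ordering is retained depends on `n_bits` and `n_active`; the paper's analysis, not this sketch, is the place to look for actual guarantees.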

Similar resources

Context-Aware Online Commercial Intention Detection

With more and more commercial activities moving onto the Internet, people tend to purchase what they need through the Internet or conduct some online research before the actual transactions happen. For many Web users, their online commercial activities start from submitting a search query to search engines. Just like the common Web search queries, the queries with commercial intention are usually v...

The Random Neural Network Applied to an Intelligent Search Assistant

Users cannot guarantee that the results they obtain from Web search engines are exhaustive, or that they actually respond to their needs. Search results are influenced by the users’ own ambiguity in formulating their requests or queries as well as by the commercial interest of Web search engines and Internet users that want to reach a wider audience. This paper presents an Intelligent Search Assist...

Discovering Popular Clicks' Pattern of Teen Users for Query Recommendation

Search engines are still the most important gateways for information search on the Internet. In this regard, providing the best response in the shortest time possible to the user's request is still desired. Normally, search engines are designed for adults and few policies have been employed considering teen users. Teen users are more biased in clicking the results list than are adult users. This leads...

A search quality evaluation based on objective-subjective method

Commercial search engines, especially meta-search engines, were designed to retrieve information by submitting users’ queries to multiple conventional search engines and integrating the partial search results generated by each search engine. How to find the best search engine for user queries is known as the selection problem of search engines. Recently, the selection problem has become a...

Design Alternatives for Large-Scale Web Search:

Indexing the Web and meeting the throughput, response-time, and failure-resilience requirements of a search engine requires massive storage and computational resources and a careful system design for scalability. This is exemplified by the big data centers of the leading commercial search engines. Various proposals and debates have appeared in the literature as to whether Web indexes can be impl...

Journal title:
  • CoRR

Volume abs/1507.05929  Issue 

Pages  -

Publication date 2015